rethinking and ulam() are great for learning; here we use the brms package (along with tidybayes and bayestestR).

Snakes data:

tibble [112 × 4] (S3: tbl_df/tbl/data.frame)
 $ ForagingMode: Factor w/ 2 levels "Active","Ambush": 2 1 2 2 1 2 2 2 2 1 ...
 $ SVL         : num [1:112] 83.7 48.7 174.2 44 87.3 ...
 $ RPM         : num [1:112] 0.301 0.33 0.075 0.33 0.0237 0.19 0.33 0.27 0.19 0.12 ...
 $ log10SVL    : num [1:112] 1.92 1.69 2.24 1.64 1.94 ...
brm() uses lm()-style model syntax. The b parameters are the intercepts and slopes.
Family: gaussian
Links: mu = identity; sigma = identity
Formula: RPM ~ ForagingMode + log10SVL - 1
Data: Snakes (Number of observations: 112)
Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 10000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
ForagingModeActive 0.13 0.10 -0.07 0.32 1.00 1950 2475
ForagingModeAmbush 0.22 0.10 0.02 0.43 1.00 1971 2571
log10SVL 0.01 0.05 -0.10 0.12 1.00 1921 2542
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.12 0.01 0.10 0.14 1.00 3610 3682
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
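A call like the following could produce the summary above (fm1 is an assumed name; priors and other arguments are left at their defaults):

```r
library(brms)

# Model 1: the -1 removes the common intercept, so each level of
# ForagingMode gets its own intercept, plus a shared slope for log10SVL
fm1 <- brm(
  RPM ~ ForagingMode + log10SVL - 1,
  data = Snakes,
  family = gaussian(),
  chains = 4, iter = 5000, warmup = 2500
)
summary(fm1)
```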
Family: gaussian
Links: mu = identity; sigma = identity
Formula: RPM ~ ForagingMode * log10SVL
Data: Snakes (Number of observations: 112)
Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
total post-warmup draws = 10000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept 0.32 0.14 0.05 0.59 1.00 3897
ForagingModeAmbush -0.28 0.19 -0.64 0.09 1.00 3185
log10SVL -0.10 0.07 -0.25 0.05 1.00 3829
ForagingModeAmbush:log10SVL 0.20 0.10 0.00 0.40 1.00 3142
Tail_ESS
Intercept 4981
ForagingModeAmbush 4325
log10SVL 4912
ForagingModeAmbush:log10SVL 4162
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.12 0.01 0.10 0.13 1.00 6046 5426
Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
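The interaction model above could be fit the same way (fm2 is an assumed name):

```r
library(brms)

# Model 2: A * B expands to A + B + A:B, so Ambush snakes get their own
# intercept offset and their own slope offset for log10SVL
fm2 <- brm(
  RPM ~ ForagingMode * log10SVL,
  data = Snakes,
  family = gaussian(),
  chains = 4, iter = 5000, warmup = 2500
)
summary(fm2)
```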
bayesplot has many functions for working with models.
Plot the distribution of RPM and a sample of 100 draws from the posterior (\(y_{rep}\))
Plot histograms of samples from the posteriors separately for ForagingMode with the mean of RPM for each.
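Assuming the additive fit is stored as fm1 (a hypothetical name), these two posterior predictive checks could look like:

```r
library(brms)  # pp_check() methods for brmsfit objects call into bayesplot

# Observed RPM density overlaid with 100 posterior predictive draws (y_rep)
pp_check(fm1, ndraws = 100)

# Histograms of the posterior predictive mean, grouped by ForagingMode,
# with the observed mean RPM marked in each panel
pp_check(fm1, type = "stat_grouped", group = "ForagingMode", stat = "mean")
```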
Three approaches
More context and practice in Unit 5
Predict across a range of log10SVL for each level of ForagingMode.
pred_values <- crossing(   # crossing() is from tidyr: all combinations
  log10SVL = seq(1.3, 2.5, length.out = 100),
  ForagingMode = levels(Snakes$ForagingMode)
)

pred_values
# A tibble: 200 × 2
log10SVL ForagingMode
<dbl> <chr>
1 1.3 Active
2 1.3 Ambush
3 1.31 Active
4 1.31 Ambush
5 1.32 Active
6 1.32 Ambush
7 1.34 Active
8 1.34 Ambush
9 1.35 Active
10 1.35 Ambush
# … with 190 more rows
posterior_epred() returns the expected value for each posterior draw (posterior_predict() would also include observation-level noise). Applied to pred_values, it gives a matrix with one row per draw and one column per prediction point:

num [1:10000, 1:200] 0.161 0.152 0.17 0.158 0.163 ...

Summarize by applying quantile() across the columns (MARGIN = 2) of p_pred, then bind the summaries to the log10SVL and ForagingMode columns of pred_values.

# A tibble: 200 × 5
log10SVL ForagingMode Q50 Q5.5 Q94.5
<dbl> <chr> <dbl> <dbl> <dbl>
1 1.3 Active 0.137 0.0885 0.187
2 1.3 Ambush 0.231 0.172 0.289
3 1.31 Active 0.137 0.0894 0.186
4 1.31 Ambush 0.231 0.173 0.288
5 1.32 Active 0.137 0.0905 0.185
6 1.32 Ambush 0.231 0.174 0.288
7 1.34 Active 0.138 0.0915 0.185
8 1.34 Ambush 0.231 0.175 0.287
9 1.35 Active 0.138 0.0925 0.184
10 1.35 Ambush 0.231 0.176 0.286
# … with 190 more rows
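The column-wise summary step can be sketched in base R. Here a simulated matrix stands in for the real p_pred (which requires the fitted model):

```r
set.seed(1)
# Stand-in for p_pred: 10000 posterior draws x 200 prediction points
p_pred <- matrix(rnorm(10000 * 200, mean = 0.18, sd = 0.03),
                 nrow = 10000, ncol = 200)

# quantile() over each column (MARGIN = 2): median and 89% interval
Q50   <- apply(p_pred, MARGIN = 2, quantile, probs = 0.5)
Q5.5  <- apply(p_pred, MARGIN = 2, quantile, probs = 0.055)
Q94.5 <- apply(p_pred, MARGIN = 2, quantile, probs = 0.945)

length(Q50)  # one summary per row of pred_values: 200
```

In the real workflow these three vectors are bound onto pred_values, giving the 200 × 5 tibble shown above.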
ggplot() +
geom_point(data = Snakes, aes(log10SVL, RPM, color = ForagingMode),
size = 3) +
geom_ribbon(data = pred_values,
aes(x = log10SVL, ymin = Q5.5, ymax = Q94.5,
fill = ForagingMode), alpha = 0.25) +
geom_line(data = pred_values,
aes(x = log10SVL, y = Q50, color = ForagingMode)) +
scale_color_manual(values = c("red", "blue")) +
  scale_fill_manual(values = c("red", "blue"))

Highest Density Interval
Parameter | 89% HDI
------------------------------------
b_ForagingModeActive | [-0.03, 0.28]
b_ForagingModeAmbush | [ 0.06, 0.39]
b_log10SVL | [-0.08, 0.09]
sigma | [ 0.11, 0.13]
Equal-Tailed Interval
Parameter | 89% ETI
------------------------------------
b_ForagingModeActive | [-0.03, 0.28]
b_ForagingModeAmbush | [ 0.06, 0.39]
b_log10SVL | [-0.08, 0.09]
sigma | [ 0.11, 0.13]
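Intervals like the two tables above can come from bayestestR, assuming the additive fit is stored as fm1:

```r
library(bayestestR)

hdi(fm1, ci = 0.89)  # highest density interval: narrowest span holding 89%
eti(fm1, ci = 0.89)  # equal-tailed interval: 5.5% cut from each tail
```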
Use mutate() to calculate the difference for each sample
# A tibble: 10,000 × 4
b_ForagingModeActive b_ForagingModeAmbush b_log10SVL d
<dbl> <dbl> <dbl> <dbl>
1 0.166 0.235 -0.00413 0.0687
2 0.197 0.309 -0.0344 0.112
3 0.223 0.301 -0.0403 0.0785
4 0.209 0.320 -0.0388 0.111
5 0.215 0.310 -0.0399 0.0944
6 0.226 0.339 -0.0512 0.113
7 0.150 0.260 -0.0109 0.110
8 0.149 0.272 -0.00596 0.123
9 0.155 0.240 -0.0112 0.0850
10 0.00531 0.0814 0.0719 0.0761
# … with 9,990 more rows
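A sketch of the mutate() step with dplyr. Simulated draws stand in for the real posterior table (which would come from, e.g., as_draws_df() on the fitted model):

```r
library(dplyr)

set.seed(2)
# Simulated stand-in for the posterior draws of the two intercepts
post <- tibble(
  b_ForagingModeActive = rnorm(10000, mean = 0.13, sd = 0.10),
  b_ForagingModeAmbush = rnorm(10000, mean = 0.22, sd = 0.10)
)

# The difference d for each of the 10,000 samples
post <- post %>%
  mutate(d = b_ForagingModeAmbush - b_ForagingModeActive)

mean(post$d > 0)  # posterior probability that Ambush > Active
```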
“A difference of zero plus or minus some small amount is for practical purposes ‘not different’ from zero.”
The easystats ecosystem has a good introduction to this idea (the ROPE: region of practical equivalence).

Use rope_range() to automatically find the most suitable range for comparison.
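A minimal sketch with bayestestR, using a simulated posterior for the difference d (rope_range() itself needs the fitted model, so a hand-chosen range is used here):

```r
library(bayestestR)

set.seed(3)
d <- rnorm(10000, mean = 0.09, sd = 0.05)  # simulated posterior of the difference

# Share of the full posterior falling inside a ROPE of +/- 0.05
rope(d, range = c(-0.05, 0.05), ci = 1)
```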
Stan (via the loo package) uses a fast approximation of leave-one-out cross-validation (LOO-CV; Gelman et al. 2014; Vehtari et al. 2017; Bürkner et al. 2021).

Model weights distribute the expected predictive ability across the candidate models so that it sums to 100%.
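With the two fits stored as fm1 and fm2 (assumed names), the comparison could be sketched as:

```r
library(brms)

# Approximate LOO-CV for each model, with a pairwise comparison
loo(fm1, fm2)

# Weights summing to 1 across the candidate models
model_weights(fm1, fm2, weights = "loo")
```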
More discussion, detail, examples in Unit 5.